DO NOT MERGE: Visualisation of my changes #14

Tom-Newton · 2024-01-18T17:10:16Z

No description provided.

* Working with Spark 3.5.0
* Use SparkSession instead of deprecated SQLContext on the python side
* Correction in init
* Update tests for new spark session configuration

* Set spark log level to error
* Run all tests with and without codegen
* Re-name
* Update tests to include product of all the things that were originally tested
* Tests with no backward rows to match with
* Add tests for empty dataframes
* Add tests for preserving nulls in the input
* Add tests for preserving nulls when tolerance is used
* Fix codegen__left_join_some_rows_have_no_backwards_match
* Fix interpreted__left_join_right_dataframe_is_empty
* Fix interpretted__inner_join_searching_backward_for_matches_with_tolerance
* Tidy
* Remove debug `.show()`s
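The commit messages above mention running every test with and without codegen and covering the product of the originally tested scenarios. A minimal sketch of how such a matrix could be enumerated in Python is shown here. The axes, the `build_test_matrix` helper, and the use of `spark.sql.codegen.wholeStage` as the toggle are assumptions for illustration; they are not taken from this repository's test suite. Only the `codegen__` / `interpreted__` name prefixes follow the test names quoted in the commits.

```python
from itertools import product

# Hypothetical axes; the real test suite may vary different things.
JOIN_TYPES = ["inner", "left"]
CODEGEN = [True, False]
TOLERANCES = [None, 1]  # None means an unbounded backward search


def build_test_matrix():
    """Return one dict per combination of join type, codegen flag and tolerance."""
    matrix = []
    for join_type, codegen, tolerance in product(JOIN_TYPES, CODEGEN, TOLERANCES):
        prefix = "codegen" if codegen else "interpreted"
        suffix = "no_tolerance" if tolerance is None else f"tolerance_{tolerance}"
        matrix.append(
            {
                "name": f"{prefix}__{join_type}_join_{suffix}",
                "join_type": join_type,
                "tolerance": tolerance,
                # Assumed toggle: Spark's whole-stage codegen setting.
                "spark_conf": {"spark.sql.codegen.wholeStage": str(codegen).lower()},
            }
        )
    return matrix
```

Each entry could then drive one parametrised test case, so that adding a new axis extends every existing scenario rather than requiring hand-written copies.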

* Start tests
* Valid tests
* Fix codegen version
* Tests with duplicate rows
* Delete unnecessary code. It was a legacy of the normal join's multi matching behaviour
* Working inner joins with more code removed and tolerance applied inside main search loop
* Working inner join
* Tidy
* Very minor clean ups
* Remove some unnecessary branching
* Fix schema assertion for left_join_duplicate_join_keys
* Remove unneeded bound condition
* Fix schema assertions for nulls in join keys
* Update comments and rename variables
* Auto-format
* Remove slightly misleading comment
* Remove completed TODO comment
* More comment adjustments
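For readers unfamiliar with the join these commits refer to, the following pure-Python sketch illustrates the semantics the messages suggest: each left row is matched with at most one right row, found by searching backward in time among rows with the same key, optionally limited by a tolerance. It is not the Scala implementation from this repository. The `pit_join` function, the `(key, ts, value)` row shape, the inclusive tolerance bound, and the assumption that keys and timestamps are non-null are all choices made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def pit_join(left, right, how="inner", tolerance=None):
    """Match each left row with the latest right row at or before its timestamp.

    Rows are (key, ts, value) tuples. Returns (key, left_ts, left_value,
    right_ts, right_value) tuples. Unmatched left rows appear only when
    how="left", with None in the right-hand columns.
    """
    if how not in ("inner", "left"):
        raise ValueError(f"unsupported join type: {how}")

    # Group right rows by the equi-join key, sorted by timestamp.
    by_key = defaultdict(list)
    for key, ts, value in right:
        by_key[key].append((ts, value))
    for rows in by_key.values():
        rows.sort(key=lambda r: r[0])

    out = []
    for key, ts, value in left:
        rows = by_key.get(key, [])
        # Index of the last right row with right_ts <= ts (backward search).
        idx = bisect_right([r[0] for r in rows], ts) - 1
        match = rows[idx] if idx >= 0 else None
        # Assumption: the tolerance is inclusive (ts - right_ts <= tolerance).
        if match is not None and tolerance is not None and ts - match[0] > tolerance:
            match = None
        if match is not None:
            out.append((key, ts, value, match[0], match[1]))
        elif how == "left":
            out.append((key, ts, value, None, None))
    return out
```

Calling `pit_join(left, right, how="left", tolerance=2)` keeps every left row and fills the right-hand columns with `None` where no right row lies within two time units before it, while `how="inner"` drops those rows instead.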

* Remove `condition` from exec code
* Fail physical planning if there are non-equi conditions
* Add test
* Throw an exception

* Initial testcase
* Correct test
* Attempt to create a new version of normal `.join`
* Delete mess
* Switch to new spark extensions framework
* First explicit pitJoin compiles
* In progress: create new entrypoint to PIT join
* Working test
* All scala tests working
* Create python side joinPIT to replace context
* Rename some parameters
* Update parameter order in tests
* Organise imports
* Tidy

* Make spark session an explicit argument
* Update Readme

Tom-Newton added 12 commits January 8, 2024 20:40

Add gitignores for metals (#2)

cd6d44b

Update for spark 3.5.0 (#3)

3f84a30

* Working with Spark 3.5.0
* Use SparkSession instead of deprecated SQLContext on the python side
* Correction in init
* Update tests for new spark session configuration

Update README to reflect new way of initialising the context (#7)

e297169

Update python spark to 3.5.0 (#8)

8c9256f

Bump versions (#9)

832dbe8

Fully remove non equi non pit condition (#10)

67a454e

* Remove `condition` from exec code
* Fail physical planning if there are non-equi conditions
* Add test
* Throw an exception

Update Readme (#12)

d098f14

* Make spark session an explicit argument
* Update Readme

Bump versions (#13)

452dee7

Avoid checking in proprietary jars

8fa2997
